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Abstract 

This mini-paper presents a fast and simple algorithm to compute the projection onto the 
canonical simplex A". Utilizing the Moreau's identity, we show that the problem is essentially 
a univariate minimization and the objective function is strictly convex and continuously differ- 
entiable. Moreover, it is shown that there are at most n candidates which can be computed 
explicitly, and the minimizer is the only one that falls into the correct interval. 

Keywords. Nonlinear programming, projection onto a simplex, Moreau's identity, proximity 
operators. 

1 Introduction 

The computation of projection onto simplex appears in many imaging and statistics problems, 
such as multiphase segmentation, diffusion tensor imaging, etc. The problem can be described 
as follows: for any given vector y € M", the projection of y onto the simplex A" is to solve the 
minimization problem 

X = arg min \\x — yll, (1) 
where || • || denotes the regular Euclidean norm and A" is the canonical simplex defined by 

A" := |x = (xi,- • • ,Xnf G M" : < < l,i = I,-- - ,n, and = l| . (2) 

For example, A-^ is the line segment between two points (1,0) and (0, 1) in M^, and A^ is the triangle 
in with vertexes (1,0,0), (0,1,0) and (0,0,1). The solution is nontrivial and it does not yield 
an explicit form. Early attempts include iterative projections onto affine subspaces of [3], and 
Lagrangian method that shows the optimal solution is {y — 1)-|_ for some t followed by the iterative 
bisection algorithm to find this t, etc. In addition, it is worth pointing out that there are a large 
number of literatures projection onto the ball {x G : < 1}, e.g. 121 [5], which has has 

similar formulation and is in general easier to solve. 

In this report, we present a novel, faster and simpler numerical algorithm to solve ([T|) for any 
dimension n > 2. 
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2 Algorithm 

The minimization problem in ([1]) can be rewritten as 



min7r(x) + , y = (yi, • • • , G M", (3) 
where vr is the indicator function of A" defined by 

tt(x) = S ,1 . (4) 

[ +00 otherwise . 

As vr is a proper and convex function due to the fact that A" is close and convex, we know ([3]), and 
hence ([1]), has a unique solution. 

In the literature, the solution to ([3]) is also denoted by the Moreau's proximity operator [3] 

(I + dTT)~^{y) := arg min ■7r{x) -\ — H^; — (5) 

which satisfies the Moreau's identity [1] 

y=(I + d7r)-Hy) + {I + d7T*)-\y), (6) 
where n* is the Fenchel transform of tt defined by 

TT*{y) := sup {z, y) - ■k{z). (7) 

Note that vr* in ([7]) has a simple form if tt is the indicator function of A" defined in ([4]), i.e. 

■K*{y) := sup {z, y) - 7r(z) = sup (z, y) 

" (8) 
= max^z.y.= maxM. 

j=i — 

So 7r*(y) is just the largest component of y. Based on ([6]), it is suffice to compute the Moreau's 
proximity operator of tt* 



(/ + Svr*) (v) = arg min 7r*(2) H — llz — -hII 

• / / l|2 

= argmin< max-^z,4H — \\z — y\\ 



(9) 



and then (j5]) can be obtained by 

(/ + d7:)-\y) = y-{I + dT:*)-\y). (10) 

To find the solution of ([9|), we first sort the components oi y = (yi, • • • , G MP in the 

ascending order as < • • • < y{n)- Then we know the minimization problem in ([9]) can be written 
as 

min < max {zj} H — \\z — y\\^ > = min min < t H — \\z — : max {zj} = w (11) 



2GM" I l<i<n 2 I teM zeK" I 2 l<i<n 
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by introducing the new variable t. For any fixed t, the inner minimization problem on the right of 
(HID is 



min ■{ t + — 11^ — y|P : max {zi} = t 



l<i<n 



and the minimizer z{t) (as it depends on t) is obviously 

t \iyi>t 



i = 1 



mh - 

and the minimum value of (jl2p is 

^ n 

t + ^Y.i^-yif if 2/(1) 



, ••• ,ri, 



(12) 



(13) 



f{t) ■.= t + -\\z{t)-y\? 



i + ^ Y it- y(j)f if < t < y(i+i), i = !,■■■ ,n - 1 



(14) 



j=i+i 



^ t + i;it-y(n)f 



if t > 



y(n) 



Note that / is well-defined at y(i),--- ,y(n) ^'S shown in Lemma 12.11 below. Therefore, (llip is 
equivalent to the univariate minimization problem 



min f[t) 



(15) 



where f{t) is defined in ([Till . 

Note that / in (I14p is piecewise quadratic, and hence is piecewise convex and smooth. Moreover, 
the following lemma shows that / G 



Lemma 2.1. The function f defined in (|14p is continuous on R, and its derivative f exists and is 
also continuous on M. 



Proof: According to (jl4p . for all i = 1, • • • , n, there are 

^71 1 " 

lim. /(t) = y(i) + - ^(y(i) - y(j)f = y{i) + ivn) - y{j)f = fiy{i)) 



(i) 



j = i+l 



and 



lim. f'{t) = 1 + ^iy(i) - y(j)) = 1 + 5Z ~ ^0)) = ^^"1 /'(?/(*))> 



(16) 



(17) 



which prove the lemma. □ 

Based on Lemma |2.H we can obtain the derivative /' as 



t^y, 



(i) 



fit) = 



l + 2Z(t-2/i) if t< 2/(1) 



1 + Y it- y{j)) if y{i) < t < y(i+i), i = !,■■■ ,n - 1 



(18) 



j=i+i 

I i + (i-y(n)) 



if * > 2/(n) 



Note that /' is well-defined at the points • • • ,y(„) in ([18]) as it is continuous. Based on the 
discussion above, we have the theorem that restricts the search in n candidates and obtains the 
solution to ([1]) using the only candidate that falls into the correct interval. 



3 



Theorem 2.2. For any vector y € M^, the projection of y onto A" as in ([T]) is obtained by the 
positive part of y — t: 

x = {y-i)+. (19) 
where t is the only one in {ti : i = 0, • • • , n — 1} that falls into the corresponding interval as follows, 

U ■■= — ^ , i = 0, • • • ,n - 1, where ti < and y(j^ < ti < y(i+i), i = !,■■■ ,n - 1. 

(20) 

Proof. Based on Lemma l2.H we know / is piecewise quadratic and / £ C^(M). As ([3]) has a 
unique minimizer, we know that Q has a unique minimizer as weh, and hence the optimal t is 
unique. Therefore, there is only one single t that has vanish derivative i.e. f'{t) = 0, and this t is 
the minimizer of mmt£R f (t) . 

Now we compute the possible choices for the optimizer t. Note that / is piecewisely defined, 
and according to (|18p . there are at most n points that have vanish derivative. It is easy to check 
that they are those shown in (j20p . 

However, since the optimal t exists and is unique, we know that there is one and only one of 
{ti : i = 0, ■ ■ ■ ,n — 1} that can be the optimal t. In another words, there is only one ti that falls 
into the "correct" interval and hence is the optimal choice of t. 

Once t is obtained, we have the minimizer of (jlip as z{t) where z(-) is defined in (|13p . and hence 
obtain the solution x to based on the Moreau's identity (fTO]l : 

which implies that x = (y — t)_|_. This completes the proof. □ 

Based on Theorem [221 we only need to find the ti in ()20p that falls in the corresponding interval, 
and claim it as the optimal t for ()2ip . This procedure is described as in Steps 2-4 in the Algorithm 
1. An interpretation of such procedure is as follows. Suppose we start with a very large t, namely 
t > y(n)) and let t go towards negative. First, we can see f'{t) = l-|-(t — y(„)) > 1 > if t > y(„) and 
hence the optimal t cannot occur in oo). As /' is continuous and f'{y(n)) = 1 > 0, we know /' 
is positive near As / is quadratic in [y(n-i)! y(n)), this also implies that the optimal t, if exists 
in this interval, can only be := — 1 based on (fT8]l . Due to the existence and uniqueness of 
t, we can surely accept t = if < tn-i < y{n)i or simply tn-i > y(n-i) (since obviously 

tn-i < y{n))- If tn-1 < y(n-i)i we know the optimal t is not achieved yet and hence /'(y(n-i)) > 
due to the fact that / is quadratic in y(n))- Repeating the similar argument, we can accept 

i = tn-2 ■■= (y(n-i) + y{n) " l)/2 if y(n-2) < tn-2 < ^(n-i), or simply tn-2 > y{n-2) as we know 
f{t) is quadratic and keeps increasing near y(„_i) in this case (hence tn-2 cannot be equal to or 
larger than y(„_i)). Based on this analysis, we can repeat at most n — 1 steps of such "compute ti 
- compare to y^j)" processes (i = n — 1, n — 2, • • • , 1) until t is found. If t is still not found after 
n — 1 steps, then it must be to := iYlJ=i Vj ~ '^)/^- 

Completed with the last step x = (y — t)^, the algorithm is summarized in Algorithm 1. 



3 Numerical Examples 

We manually computed the projections of several examples of y in low-dimensional (2D and 
3D) cases using Algorithm 1, and the results turned out to be exact as expected. For more general 
cases, it is usually nontrivial to demonstrate the correctness of Algorithm 1 numerically. 
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Algorithm 1 (projsplx). Projection of y S M" onto the simplex A*^. 

1. Input y = (yi,-- - ,y„f gM"; 

2. Sort y in the ascending order as < • • • < ?/(„), and set i = n — 1; 

3. Compute ti = — ^"'^^/^^ — • If U > y^^i^ then set i = ti and go to Step 5, otherwise set i i — 1 
and redo Step 3 if i > 1 or go to Step 4 if i = 0; 

4. Set i= ^^kllizl- 

n ' 

5. Return x = {y — as the projection of y onto A". 



2^ 




(a) 2D case (b) 3D case 

Figure 1: Projections (blue) of 1024 random points (red) onto simplexes A^ (left) and A^ (right). 

For completeness of this report, we show several numerical examples of Algorithm 1. The 
algorithm is implemented in C and compiled in MATLAB (Version R2010b) using mex function. 
The computations are performed on a Lenovo ThinkPad laptop with Intel Core 2 CPU at 2.53GHz, 
3GB of memory, and GNU/Linux (Kernel version 2.6.35) operating system. 

We first generate 1024 2D points from A(0,/2) using the MATLAB function randn(2 , 1024) 
and project them onto A^ using Algorithm 1. The results are plotted in Figure [l(a)[ A similar 
test projects 1024 samples from A(0,0.5/3) to A^ and plots the results in Figure [l(b)[ Note that 
both tests are not designed to show that Algorithm computed the correct projections, since the 
correspondences are not shown. On the other hand, these two tests demonstrate that the outputs 
X are indeed located at the simplex (note that we did not check anywhere in Algorithm 1 that x is 
on the simplex). 

The next test shows the CPU time of projections of 2^^ = 65, 536 n-dimensional points draw 
from A(0, /„) for n = 2, 3, • • • , 50. The results are plotted in Figure El The CPU time has the trend 
of increase as n becomes larger. Interestingly, the increasing rate is rather low. For example, for 
n = 5 to n = 50, the CPU time used only changes from 1.88s to 2.52s, for which the difference seems 
rather minor compared to the significant changes in computational complexity of the problem. 
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Figure 2: CPU time (in seconds) of projections of 65, 536 n-dimensional points drawn from A^(0, In)- 
The tests are carried on for n = 2, • • • ,50. 



4 Concluding Remarks 

In this paper we propose a fast and simple algorithm projsplx as shown in Algorithm 1 that 
projects an n-dimensional vector y to the canonical simplex A". The computation comprises of the 
sort of components of the input y and at most n simple "compute-compare" processes. The solution 
is exact and the algorithm is extremely easy to implement. The MATLAB/C code is provided on 
the following websites for public use. 



Author's website: http : //www . math . uf 1 . edu/~xye/codes/pro j splx . zip 



Matlab Central: http : //www . mathworks . com/matlabcentral/f ileexchange/30332| 
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